SuperCAT: The (New and Improved) Corpus Analysis Toolkit
Abstract
This paper presents SuperCAT, a corpus analysis toolkit. It is a radical extension of SubCAT, the Sublanguage Corpus Analysis Toolkit, from sublanguage analysis to corpus analysis in general. The idea behind SuperCAT is that representative corpora have no tendency towards closure; that is, they tend towards infinity. In contrast, non-representative corpora have a tendency towards closure (roughly, towards finiteness). SuperCAT focuses on general techniques for the quantitative description of the characteristics of any corpus (or other language sample), particularly the characteristics of lexical distributions. SuperCAT also features a complete re-engineering of the earlier SubCAT architecture.
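One common way to make "tendency towards closure" concrete is a vocabulary growth curve: count the distinct word types seen as tokens accumulate and check whether the curve flattens (closure) or keeps rising (no closure). The sketch below is a minimal illustration under that assumption only. The names `vocabulary_growth` and `tail_growth_rate` are hypothetical and are not SuperCAT's interface; a real analysis would use much larger samples and some form of curve fitting rather than a single tail interval.

```python
def vocabulary_growth(tokens, step):
    """Return (tokens_seen, distinct_types) pairs every `step` tokens.

    Hypothetical helper for illustration; not part of SuperCAT.
    The final point is always included, even if len(tokens) is not
    a multiple of `step`.
    """
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve


def tail_growth_rate(curve):
    """New types per token over the last interval of the curve.

    A value near 0 suggests the sample is approaching closure; a value
    that stays well above 0 suggests it is not. Requires >= 2 points.
    """
    (n0, v0), (n1, v1) = curve[-2], curve[-1]
    return (v1 - v0) / (n1 - n0)


# Usage: a tiny "closed" sample (3 types repeated) vs. an "open" one
# where every token is a new type.
closed_sample = ["a", "b", "c"] * 100
open_sample = [f"w{i}" for i in range(300)]
print(tail_growth_rate(vocabulary_growth(closed_sample, 50)))
print(tail_growth_rate(vocabulary_growth(open_sample, 50)))
```

In the closed sample no new types appear after the first few tokens, so the tail rate is 0; in the open sample each token adds a type, so the rate stays at 1.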
Journal title:
- LREC ... International Conference on Language Resources & Evaluation : [proceedings]. International Conference on Language Resources and Evaluation
Volume 2016
Publication date: 2016